discrimination score
Discriminating Form and Meaning in Multilingual Models with Minimal-Pair ABX Tasks
de Seyssel, Maureen, Chi, Jie, Seto, Skyler, ter Hoeve, Maartje, Fedzechkina, Masha, Schluter, Natalie
We introduce a set of training-free, ABX-style discrimination tasks to evaluate how multilingual language models represent language identity (form) and semantic content (meaning). Inspired by speech processing, these zero-shot tasks measure whether minimal differences in representation can be reliably detected, offering a flexible and interpretable alternative to probing. Applied to XLM-R (Conneau et al., 2020) across pretraining checkpoints and layers, we find that language discrimination declines over training and becomes concentrated in lower layers, while meaning discrimination strengthens over time and stabilizes in deeper layers. We then explore probing tasks, showing some alignment between our metrics and linguistic learning performance. Our results position ABX tasks as a lightweight framework for analyzing the structure of multilingual representations.
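To make the ABX setup concrete, here is a minimal sketch (not the authors' code) of how such a discrimination score can be computed over sentence embeddings: for each triple (a, b, x) where x shares the tested property with a but not with b, check whether x is closer to a than to b, and report the fraction of triples for which that holds. The cosine distance, random toy embeddings, and function names are illustrative assumptions.

```python
import numpy as np

def cosine_distance(u, v):
    """1 - cosine similarity between two vectors."""
    return 1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def abx_score(triples):
    """Fraction of (a, b, x) triples where x is closer to a than to b.

    x shares the tested property (e.g. language identity or meaning)
    with a, not with b. 0.5 is chance; 1.0 is perfect discrimination.
    """
    correct = sum(
        cosine_distance(x, a) < cosine_distance(x, b)
        for a, b, x in triples
    )
    return correct / len(triples)

# Toy example with random "embeddings"; real usage would take hidden
# states from a specific layer/checkpoint of a multilingual model.
rng = np.random.default_rng(0)
triples = [(rng.normal(size=8), rng.normal(size=8), rng.normal(size=8))
           for _ in range(100)]
print(f"ABX discrimination score: {abx_score(triples):.2f}")
```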
MoSEs: Uncertainty-Aware AI-Generated Text Detection via Mixture of Stylistics Experts with Conditional Thresholds
Wu, Junxi, Wang, Jinpeng, Liu, Zheng, Chen, Bin, Hu, Dongjian, Wu, Hao, Xia, Shu-Tao
The rapid advancement of large language models has intensified public concerns about their potential misuse. Therefore, it is important to build trustworthy AI-generated text detection systems. Existing methods neglect stylistic modeling and mostly rely on static thresholds, which greatly limits detection performance. In this paper, we propose the Mixture of Stylistic Experts (MoSEs) framework, which enables stylistics-aware uncertainty quantification through conditional threshold estimation. MoSEs contains three core components: the Stylistics Reference Repository (SRR), the Stylistics-Aware Router (SAR), and the Conditional Threshold Estimator (CTE). For an input text, the SAR activates the appropriate reference data in the SRR and provides it to the CTE. Subsequently, the CTE jointly models linguistic statistical properties and semantic features to dynamically determine the optimal threshold. Together with a discrimination score, MoSEs yields prediction labels with a corresponding confidence level. Our framework achieves an average improvement of 11.34% in detection performance compared to baselines. More encouragingly, MoSEs shows an even larger improvement of 39.15% in the low-resource case. Our code is available at https://github.com/creator-xi/MoSEs.
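As a rough illustration of the conditional-threshold idea (a simplified sketch, not the MoSEs implementation; the hand-crafted style features, nearest-neighbour routing, and quantile rule are all assumptions), one can route an input text to stylistically similar reference texts and derive a decision threshold from the detector scores of the human-written references among them:

```python
import numpy as np

def style_features(text):
    """Crude stylistic features: average word length and sentence length."""
    words = text.split()
    sentences = [s for s in text.split(".") if s.strip()]
    return np.array([
        np.mean([len(w) for w in words]) if words else 0.0,
        len(words) / max(len(sentences), 1),
    ])

def conditional_threshold(text, ref_feats, ref_scores, ref_is_human,
                          k=50, quantile=0.95):
    """Threshold taken from the k stylistically nearest references.

    ref_scores are detector scores (higher = more likely AI-generated);
    the threshold is a high quantile of the human references' scores,
    loosely controlling false positives within that style region.
    """
    d = np.linalg.norm(ref_feats - style_features(text), axis=1)
    nearest = np.argsort(d)[:k]
    human_scores = ref_scores[nearest][ref_is_human[nearest]]
    return np.quantile(human_scores, quantile)

# Synthetic reference repository for demonstration only.
rng = np.random.default_rng(0)
ref_feats = rng.normal(loc=[4.5, 18.0], scale=[0.5, 4.0], size=(400, 2))
ref_scores = rng.random(400)
ref_is_human = np.arange(400) < 200
t = conditional_threshold("A short example sentence. And another one.",
                          ref_feats, ref_scores, ref_is_human)
print(f"conditional threshold: {t:.2f}")  # label "AI" if detector score > t
```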
- Europe > Germany > Bavaria > Upper Bavaria > Munich (0.05)
- Asia > Sri Lanka (0.04)
- Asia > China > Guangdong Province > Shenzhen (0.04)
- (4 more...)
- Media > Film (1.00)
- Media > News (0.67)
- Leisure & Entertainment > Sports > Soccer (0.46)
- Health & Medicine > Therapeutic Area > Ophthalmology/Optometry (0.46)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Trusting Fair Data: Leveraging Quality in Fairness-Driven Data Removal Techniques
Duong, Manh Khoi, Conrad, Stefan
In this paper, we deal with bias mitigation techniques that remove specific data points from the training set in order to achieve a fair representation of the population in that set. Machine learning models are trained on these pre-processed datasets, and their predictions are expected to be fair. However, such approaches may exclude relevant data, making the attained subsets less trustworthy for further use. To enhance the trustworthiness of prior methods, we propose additional requirements and objectives that the subsets must fulfill in addition to fairness: (1) group coverage, and (2) minimal data loss. While removing entire groups may improve the measured fairness, this practice is problematic because a subset that fails to represent every group cannot be considered fair. For the second concern, we advocate retaining as much data as possible while minimizing discrimination. By formulating a multi-objective optimization problem over fairness and data loss, we propose a methodology to find Pareto-optimal solutions that balance these objectives. By identifying such solutions, users can make informed decisions about the trade-off between fairness and data quality and select the most suitable subset for their application.
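The multi-objective view can be sketched as follows (an illustrative toy, not the paper's algorithm; using statistical parity difference as the fairness objective and the removal fraction as data loss are our assumptions): score every candidate subset on discrimination and data loss, then keep the Pareto-optimal ones.

```python
import numpy as np

def statistical_parity_difference(y, group):
    """|P(y=1 | group=0) - P(y=1 | group=1)| for binary labels/groups."""
    return abs(y[group == 0].mean() - y[group == 1].mean())

def pareto_front(candidates):
    """Keep candidates not dominated on (discrimination, data_loss)."""
    front = []
    for i, (d_i, l_i, keep_i) in enumerate(candidates):
        dominated = any(
            (d_j <= d_i and l_j <= l_i) and (d_j < d_i or l_j < l_i)
            for j, (d_j, l_j, _) in enumerate(candidates) if j != i
        )
        if not dominated:
            front.append((d_i, l_i, keep_i))
    return front

rng = np.random.default_rng(1)
y = rng.integers(0, 2, 500)           # binary labels
g = rng.integers(0, 2, 500)           # binary protected group
candidates = []
for _ in range(20):                   # random candidate subsets (placeholder)
    keep = rng.random(500) > rng.uniform(0.0, 0.3)
    disc = statistical_parity_difference(y[keep], g[keep])
    loss = 1.0 - keep.mean()          # fraction of data removed
    candidates.append((disc, loss, keep))
for disc, loss, _ in pareto_front(candidates):
    print(f"discrimination={disc:.3f}  data_loss={loss:.3f}")
```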
- North America > United States (0.04)
- Europe > Germany > North Rhine-Westphalia > Düsseldorf Region > Düsseldorf (0.04)
- Asia > Singapore (0.04)
- Asia > Malaysia > Penang (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)
Measuring and Mitigating Bias for Tabular Datasets with Multiple Protected Attributes
Duong, Manh Khoi, Conrad, Stefan
Motivated by Recital (67) of the current corrigendum of the European Union's AI Act, we propose and present measures and mitigation strategies for discrimination in tabular datasets. We specifically focus on datasets that contain multiple protected attributes, such as nationality, age, and sex. This makes measuring and mitigating bias more challenging, as many existing methods are designed for a single protected attribute. This paper makes a twofold contribution. First, new discrimination measures are introduced; these measures are categorized in our framework along with existing ones, guiding researchers and practitioners in choosing the right measure to assess the fairness of the underlying dataset. Second, a novel application of an existing bias mitigation method, FairDo, is presented. We show that this strategy can mitigate any type of discrimination, including intersectional discrimination, by transforming the dataset. Through experiments on real-world datasets (Adult, Bank, Compas), we demonstrate that de-biasing datasets with multiple protected attributes is achievable. Further, machine learning models trained on the transformed fair datasets do not suffer a significant loss in performance compared to models trained on the original datasets. Discrimination was reduced by up to 83% in our experiments. For most experiments, the disparity between protected groups was reduced by at least 7%, and by 27% on average. Overall, the findings show that the mitigation strategy is effective, and this study contributes to the ongoing discussion on the implementation of the European Union's AI Act.
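One simple measure in this spirit (an illustrative sketch under our own assumptions, not necessarily one of the paper's proposed measures) is the maximum pairwise gap in positive-outcome rates across the intersectional groups formed by all protected attributes:

```python
import pandas as pd

def max_intersectional_gap(df, label_col, protected_cols):
    """Max difference in positive-label rates across the intersectional
    groups defined jointly by all protected attributes."""
    rates = df.groupby(protected_cols)[label_col].mean()
    return float(rates.max() - rates.min())

# Toy data with two protected attributes (sex, age group).
df = pd.DataFrame({
    "sex":   ["F", "F", "M", "M", "F", "M", "F", "M"],
    "age":   ["<30", ">=30", "<30", ">=30", "<30", "<30", ">=30", ">=30"],
    "hired": [1, 0, 1, 1, 0, 1, 0, 1],
})
print(max_intersectional_gap(df, "hired", ["sex", "age"]))
```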
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- Europe > Spain > Galicia > A Coruña Province > Santiago de Compostela (0.04)
- (2 more...)
Improving Fairness in Credit Lending Models using Subgroup Threshold Optimization
Ying, Cecilia, Thomas, Stephen
In an effort to improve the accuracy of credit lending decisions, many financial institutions now use predictions from machine learning models. While such predictions enjoy many advantages, recent research has shown that they have the potential to be biased and unfair towards certain subgroups of the population. To combat this, several techniques have been introduced to help remove bias and improve the overall fairness of the predictions. We introduce a new fairness technique, called Subgroup Threshold Optimizer (STO), that requires neither alterations to the input training data nor changes to the underlying machine learning algorithm, and thus can be used with any existing machine learning pipeline. STO works by optimizing the classification thresholds for individual subgroups in order to minimize the overall discrimination score between them. Our experiments on a real-world credit lending dataset show that STO can reduce gender discrimination by over 90%.
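A minimal sketch of per-subgroup threshold optimization (our own simplification, not the paper's STO code; the grid search and the demographic-parity-style discrimination score are assumptions): pick each subgroup's threshold so that its positive-prediction rate lands as close as possible to a common target rate, which drives the gap between subgroups toward zero.

```python
import numpy as np

def fit_subgroup_thresholds(scores, groups, target_rate, grid=None):
    """Per-group thresholds whose positive rate best matches target_rate."""
    grid = np.linspace(0, 1, 101) if grid is None else grid
    thresholds = {}
    for g in np.unique(groups):
        s = scores[groups == g]
        rates = np.array([(s >= t).mean() for t in grid])
        thresholds[g] = grid[np.argmin(np.abs(rates - target_rate))]
    return thresholds

def discrimination_score(scores, groups, thresholds):
    """Max gap in positive-prediction rates between subgroups."""
    rates = [(scores[groups == g] >= t).mean() for g, t in thresholds.items()]
    return max(rates) - min(rates)

rng = np.random.default_rng(2)
groups = rng.integers(0, 2, 2000)
scores = np.clip(rng.normal(0.5 + 0.1 * groups, 0.2), 0, 1)  # biased scores
single = {g: 0.5 for g in (0, 1)}                            # one shared threshold
tuned = fit_subgroup_thresholds(scores, groups, target_rate=0.5)
print("gap, single threshold:   ", discrimination_score(scores, groups, single))
print("gap, per-group thresholds:", discrimination_score(scores, groups, tuned))
```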
- North America > Canada > Ontario > Kingston (0.14)
- North America > United States > New York > New York County > New York City (0.04)
- Banking & Finance (1.00)
- Law > Civil Rights & Constitutional Law (0.68)
- Information Technology > Security & Privacy (0.46)
Evaluating and Mitigating Discrimination in Language Model Decisions
Tamkin, Alex, Askell, Amanda, Lovitt, Liane, Durmus, Esin, Joseph, Nicholas, Kravec, Shauna, Nguyen, Karina, Kaplan, Jared, Ganguli, Deep
As language models (LMs) advance, interest is growing in applying them to high-stakes societal decisions, such as determining financing or housing eligibility. However, their potential for discrimination in such contexts raises ethical concerns, motivating the need for better methods to evaluate these risks. We present a method for proactively evaluating the potential discriminatory impact of LMs in a wide range of use cases, including hypothetical use cases where they have not yet been deployed. Specifically, we use an LM to generate a wide array of potential prompts that decision-makers may input into an LM, spanning 70 diverse decision scenarios across society, and systematically vary the demographic information in each prompt. Applying this methodology reveals patterns of both positive and negative discrimination in the Claude 2.0 model in select settings when no interventions are applied. While we do not endorse or permit the use of language models to make automated decisions for the high-risk use cases we study, we demonstrate techniques to significantly decrease both positive and negative discrimination through careful prompt engineering, providing pathways toward safer deployment in use cases where they may be appropriate. Our work enables developers and policymakers to anticipate, measure, and address discrimination as language model capabilities and applications continue to expand. We release our dataset and prompts at https://huggingface.co/datasets/Anthropic/discrim-eval
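A minimal sketch of how decisions over demographically varied prompts can be scored (our own simplification, not the paper's exact statistical analysis; the per-group decision lists and the reference-group choice are placeholders): compare each group's positive-decision rate against a reference group on prompts that are identical apart from the demographic information.

```python
import numpy as np

def demographic_gaps(decisions, reference_group):
    """Gap in positive-decision rate of each group vs a reference group.

    `decisions` maps demographic group -> list of 0/1 decisions taken on
    prompts that differ only in the stated demographic information.
    Positive values mean the group is favored relative to the reference.
    """
    ref_rate = np.mean(decisions[reference_group])
    return {g: float(np.mean(d) - ref_rate)
            for g, d in decisions.items() if g != reference_group}

decisions = {                      # toy decisions from some model
    "group_A": [1, 1, 0, 1, 1, 0, 1, 1],
    "group_B": [1, 0, 0, 1, 0, 0, 1, 0],
    "group_C": [1, 1, 1, 1, 0, 1, 1, 1],
}
print(demographic_gaps(decisions, reference_group="group_A"))
```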
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > United States > Kentucky (0.04)
- (4 more...)
- Health & Medicine (1.00)
- Government (1.00)
- Banking & Finance (1.00)
- (2 more...)
Differentiating Student Feedbacks for Knowledge Tracing
In computer-aided education and intelligent tutoring systems, knowledge tracing (KT) has attracted growing attention with the development of data-driven learning methods; it aims to predict students' future performance from their past question-response sequences in order to trace their knowledge states. However, current deep learning approaches focus only on improving prediction accuracy and neglect the discrimination imbalance of responses. That is, a considerable proportion of question responses are only weakly discriminative of students' knowledge states, yet they are weighted equally with more discriminative responses, which hurts the ability to trace students' personalized knowledge states. To tackle this issue, we propose DR4KT, which reweights the contribution of different responses according to their discrimination during training. To retain high prediction accuracy on low-discrimination responses after reweighting, DR4KT also introduces a discrimination-aware score fusion technique that properly combines student knowledge mastery with the questions themselves. Comprehensive experimental results show that DR4KT, applied to four mainstream KT methods, significantly improves their performance on three widely-used datasets.
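One classical way to quantify how discriminative a question's responses are (an illustrative stand-in; the paper's exact reweighting in DR4KT may differ) is the upper-lower index from item analysis: the gap in correct-answer rates between high-scoring and low-scoring students, which can then be turned into a per-response weight for the training loss.

```python
import numpy as np

def item_discrimination(responses, student_totals, frac=0.27):
    """Upper-lower discrimination index for one question.

    responses:      0/1 correctness of each student on this question
    student_totals: each student's overall score across all questions
    Returns correct-rate(top frac) - correct-rate(bottom frac), in [-1, 1].
    """
    order = np.argsort(student_totals)
    k = max(1, int(frac * len(order)))
    low, high = responses[order[:k]], responses[order[-k:]]
    return high.mean() - low.mean()

def response_weight(disc, floor=0.1):
    """Map a discrimination index to a positive per-response loss weight."""
    return floor + max(disc, 0.0)  # weakly discriminative responses count less

rng = np.random.default_rng(3)
totals = rng.integers(0, 50, 200)                             # ability proxy
resp = (rng.random(200) < (0.3 + 0.01 * totals)).astype(int)  # one question
d = item_discrimination(resp, totals)
print(f"discrimination={d:.2f}, loss weight={response_weight(d):.2f}")
```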
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > United States > New York > New York County > New York City (0.05)
- Asia > China > Shanghai > Shanghai (0.04)
- (6 more...)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.93)
Certifying Fairness of Probabilistic Circuits
Selvam, Nikil Roashan, Broeck, Guy Van den, Choi, YooJung
With the increased use of machine learning systems for decision making, questions about the fairness properties of such systems are taking center stage. Most existing work on algorithmic fairness assumes complete observation of features at prediction time, as is the case for popular notions like statistical parity and equal opportunity. However, this is not sufficient for models that can make predictions with partial observations, as we could miss patterns of bias and incorrectly certify a model as fair. To address this, a recently introduced notion of fairness asks whether the model exhibits any discrimination pattern, in which an individual characterized by (partial) feature observations receives vastly different decisions merely by disclosing one or more sensitive attributes such as gender and race. By explicitly accounting for partial observations, this provides a much more fine-grained notion of fairness. In this paper, we propose an algorithm to search for discrimination patterns in a general class of probabilistic models, namely probabilistic circuits. Previously, such algorithms were limited to naive Bayes classifiers, which make strong independence assumptions; by contrast, probabilistic circuits provide a unifying framework for a wide range of tractable probabilistic models and can even be compiled from certain classes of Bayesian networks and probabilistic programs, making our method much more broadly applicable. Furthermore, for an unfair model, it may be useful to quickly find discrimination patterns and distill them for better interpretability. We therefore also propose a sampling-based approach to more efficiently mine discrimination patterns, and introduce new classes of patterns, such as minimal, maximal, and Pareto-optimal patterns, that can effectively summarize exponentially many discrimination patterns.
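To illustrate what a discrimination pattern is (a toy sketch over an explicit joint probability table; the paper instead works with probabilistic circuits, which make such conditionals tractable at scale, and the gap threshold delta is an assumption): for each partial observation of the non-sensitive features and each sensitive value s, compare P(decision=1 | observation, s) with P(decision=1 | observation) and flag large gaps.

```python
import numpy as np

# Toy joint P(decision, sensitive, feature) as a 2x2x2 probability table.
joint = np.array([[[0.10, 0.05],   # decision = 0
                   [0.15, 0.20]],
                  [[0.20, 0.05],   # decision = 1
                   [0.05, 0.20]]])
joint /= joint.sum()

def p_decision(evidence):
    """P(decision=1 | evidence); evidence maps axis index -> observed value."""
    table = joint
    for axis, val in sorted(evidence.items(), reverse=True):
        table = np.take(table, val, axis=axis)            # condition (unnormalized)
    table = table.sum(axis=tuple(range(1, table.ndim)))   # marginalize the rest
    return table[1] / table.sum()

def discrimination_patterns(delta=0.10):
    """Partial observations where disclosing the sensitive value shifts
    the decision probability by more than delta."""
    patterns = []
    for feat_obs in ({}, {2: 0}, {2: 1}):                 # partial feature obs
        base = p_decision(feat_obs)
        for s in (0, 1):
            gap = p_decision({**feat_obs, 1: s}) - base
            if abs(gap) > delta:
                patterns.append((feat_obs, s, round(float(gap), 3)))
    return patterns

print(discrimination_patterns())
```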
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- Asia > Middle East > Jordan (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Arizona (0.04)
Discrimination and Class Imbalance Aware Online Naive Bayes
Badar, Maryam, Fisichella, Marco, Iosifidis, Vasileios, Nejdl, Wolfgang
Fairness-aware mining of massive data streams is a growing and challenging concern in contemporary machine learning. Many stream learning algorithms are used to replace humans at critical decision-making points, e.g., hiring staff or assessing credit risk. This calls for handling massive incoming information with minimal response delay while ensuring fair, high-quality decisions. Recent discrimination-aware learning methods are optimized based on overall accuracy. However, overall accuracy is biased in favor of the majority class; therefore, state-of-the-art methods mainly diminish discrimination by partially or completely ignoring the minority class. In this context, we propose a novel adaptation of Naïve Bayes to mitigate discrimination embedded in the streams while maintaining high predictive performance for both the majority and minority classes. Our proposed algorithm is simple, fast, and attains multi-objective optimization goals. To handle class imbalance and concept drift, a dynamic instance weighting module is proposed, which gives more importance to recent instances and less importance to obsolete instances based on their membership in the minority or majority class. We conducted experiments on a range of streaming and static datasets and found that our proposed methodology outperforms existing state-of-the-art fairness-aware methods in terms of both discrimination score and balanced accuracy.
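A bare-bones sketch of the dynamic instance weighting idea (our simplification, not the paper's algorithm; the exponential decay and inverse-class-mass weight are assumptions): maintain weighted counts for a categorical Naïve Bayes model, decay all counts when a new instance arrives, and weight the new instance more heavily if its class is currently under-represented.

```python
import math
from collections import defaultdict

class WeightedOnlineNB:
    """Categorical Naive Bayes on a stream with decayed, class-balanced weights."""

    def __init__(self, decay=0.999, smoothing=1.0):
        self.decay, self.smoothing = decay, smoothing
        self.class_w = defaultdict(float)   # weighted class counts
        self.feat_w = defaultdict(float)    # weighted (class, feature, value) counts

    def partial_fit(self, x, y):
        # Decay all counts so obsolete instances lose influence (concept drift).
        for d in (self.class_w, self.feat_w):
            for k in d:
                d[k] *= self.decay
        # Weight the new instance inversely to its class's current mass,
        # so minority-class instances count more.
        total = sum(self.class_w.values())
        w = (total + 1.0) / (self.class_w[y] + 1.0)
        self.class_w[y] += w
        for i, v in enumerate(x):
            self.feat_w[(y, i, v)] += w

    def predict(self, x):
        total = sum(self.class_w.values())
        def log_posterior(c):
            s = math.log((self.class_w[c] + self.smoothing) / (total + self.smoothing))
            for i, v in enumerate(x):
                s += math.log((self.feat_w[(c, i, v)] + self.smoothing)
                              / (self.class_w[c] + self.smoothing))
            return s
        return max(self.class_w, key=log_posterior)

nb = WeightedOnlineNB()
for x, y in [((0, 1), "approve"), ((1, 1), "deny"), ((0, 0), "approve"),
             ((1, 0), "deny"), ((0, 1), "approve")]:
    nb.partial_fit(x, y)
print(nb.predict((1, 1)))
```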
- North America > United States (0.46)
- Europe > Germany > Lower Saxony (0.14)
- Europe > Netherlands > North Holland > Amsterdam (0.04)
- Health & Medicine (1.00)
- Government > Regional Government (1.00)
- Banking & Finance (0.86)
- (3 more...)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Revealing Unfair Models by Mining Interpretable Evidence
Bajaj, Mohit, Chu, Lingyang, Romaniello, Vittorio, Singh, Gursimran, Pei, Jian, Zhou, Zirui, Wang, Lanjun, Zhang, Yong
The popularity of machine learning has increased the risk of unfair models being deployed in high-stakes applications, such as the justice system, drug/vaccination design, and medical diagnosis. Although there are effective methods to train fair models from scratch, automatically revealing and explaining the unfairness of a trained model remains a challenging task. Revealing the unfairness of machine learning models in an interpretable fashion is a critical step towards fair and trustworthy AI. In this paper, we systematically tackle the novel task of revealing unfair models by mining interpretable evidence (RUMIE). The key idea is to find solid evidence in the form of a group of data instances that are most discriminated against by the model. To make the evidence interpretable, we also find a set of human-understandable key attributes and decision rules that characterize the discriminated data instances and distinguish them from the other, non-discriminated data. As demonstrated by extensive experiments on many real-world datasets, our method finds highly interpretable and solid evidence to effectively reveal the unfairness of trained models. Moreover, it is much more scalable than all of the baseline methods.
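A condensed illustration of the evidence-mining idea (a simplified stand-in, not RUMIE itself; using a counterfactual flip of the sensitive attribute as the per-instance discrimination score and a shallow decision tree for rule extraction are our assumptions): rank instances by how much flipping the sensitive attribute changes the model's score, take the most-affected group as evidence, and fit a small tree to describe that group with human-readable rules.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(4)
n = 1000
X = rng.normal(size=(n, 3))                   # features: f0, f1, sensitive
X[:, 2] = rng.integers(0, 2, n)               # binary sensitive attribute
y = ((X[:, 0] + 0.8 * X[:, 2] + rng.normal(scale=0.5, size=n)) > 0).astype(int)
model = LogisticRegression(max_iter=1000).fit(X, y)   # a (deliberately) unfair model

# Per-instance discrimination: score change when the sensitive attribute flips.
X_flip = X.copy()
X_flip[:, 2] = 1 - X_flip[:, 2]
disc = np.abs(model.predict_proba(X)[:, 1] - model.predict_proba(X_flip)[:, 1])

# Evidence group: top 10% most-affected instances; describe it with rules
# over the non-sensitive features only.
evidence = disc >= np.quantile(disc, 0.9)
tree = DecisionTreeClassifier(max_depth=2).fit(X[:, :2], evidence)
print(export_text(tree, feature_names=["f0", "f1"]))
```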
- North America > Canada > Ontario > Hamilton (0.46)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Burnaby (0.14)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.14)
- (2 more...)
- Health & Medicine > Pharmaceuticals & Biotechnology (0.48)
- Law > Criminal Law (0.34)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.96)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.93)